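Over the past decade, reinforcement learning (RL) has successfully solved complex control tasks and decision-making problems, such as the board game Go. However, there are few success stories of deploying these algorithms in real-world scenarios. One reason is the lack of guarantees for handling and avoiding unsafe states, which is a fundamental requirement for critical control-engineering systems. In this paper, we introduce Guided Safe Shooting (GuSS), a model-based RL approach that learns to control systems with minimal violations of safety constraints. The model is learned, in an iterated-batch fashion, from data collected while the system operates, and is then used to plan the best action to execute at each time step. We propose three different safe planners: one based on a simple random-shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm. Experiments show that these planners help the learning agent avoid unsafe situations while maximally exploring the state space, which is necessary for learning an accurate model of the system. Furthermore, compared with model-free approaches, learning a model reduces the number of interactions with the real system while still reaching high rewards, a fundamental requirement when dealing with engineering systems.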
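The simplest of the three planners, random shooting with a learned model, can be illustrated roughly as follows. This is a minimal sketch, not GuSS itself: the 1-D dynamics, reward, and unsafe-state test are hypothetical stand-ins for a learned model and a real constraint, and the ranking rule (fewest predicted violations first, then highest predicted return) is an assumption made for illustration. The MAP-Elites-based planners are not shown.

```python
import random

# Hypothetical stand-ins, not taken from the paper.
def learned_model(state, action):
    """Toy 1-D dynamics 'learned' from data: next state = state + action."""
    return state + action

def reward(state):
    """Toy reward: stay close to the target position 5.0."""
    return -abs(state - 5.0)

def is_unsafe(state):
    """Toy safety constraint: positions above 6.0 are unsafe."""
    return state > 6.0

def safe_random_shooting(state, horizon=5, n_candidates=200, rng=None):
    """Sample random action sequences, roll each out through the model,
    and return the first action of the best one (model-predictive style).

    Candidates are ranked by fewest predicted constraint violations first,
    then by highest predicted return (an assumed ranking rule).
    """
    if rng is None:
        rng = random.Random(0)
    best_key, best_action = None, 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total_reward, violations = state, 0.0, 0
        for a in actions:
            s = learned_model(s, a)
            total_reward += reward(s)
            violations += int(is_unsafe(s))
        key = (-violations, total_reward)  # lexicographic: safety, then reward
        if best_key is None or key > best_key:
            best_key, best_action = key, actions[0]
    return best_action

def run_episode(start=0.0, steps=10, seed=0):
    """Replan at every time step and apply only the first planned action.
    For simplicity, the 'real' system here is the toy model itself."""
    rng = random.Random(seed)
    s, trajectory = start, [start]
    for _ in range(steps):
        s = learned_model(s, safe_random_shooting(s, rng=rng))
        trajectory.append(s)
    return trajectory

print(run_episode())
```

Replanning from scratch at every step keeps the sketch short; in a full model-based RL loop, the transitions observed while running the episode would also be added to the dataset used to retrain the model in the next batch.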
To apply federated learning to drug discovery, we developed a novel platform within the European Innovative Medicines Initiative (IMI) project MELLODDY (grant no. 831472), which comprised 10 pharmaceutical companies, academic research labs, large industrial companies, and startups. The MELLODDY platform was the first industry-scale platform to enable the creation of a global federated model for drug discovery without sharing the confidential data sets of the individual partners. The federated model was trained on the platform by aggregating the gradients of all contributing partners in a secure, cryptographically protected way after each training iteration. The platform was deployed on an Amazon Web Services (AWS) multi-account architecture running Kubernetes clusters in private subnets. Organisationally, the roles of the different partners were codified as distinct rights and permissions on the platform and administered in a decentralized way. The MELLODDY platform generated new scientific discoveries, which are described in a companion paper.
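The abstract does not say which cryptographic protocol was used for gradient aggregation. As an illustration of the general idea only, the sketch below swaps in pairwise additive masking (the core trick of well-known secure-aggregation schemes): each pair of partners shares a random mask that one adds and the other subtracts, so the server sees only masked vectors, yet their sum equals the sum of the true gradients. The seeded `random.Random` merely stands in for a pairwise key agreement and is not secure; partner dropout is not handled; `MODULUS` and `SCALE` are assumed parameters.

```python
import random

MODULUS = 2**32  # masked arithmetic is done modulo 2^32 (assumed ring size)
SCALE = 2**16    # fixed-point scale for encoding real-valued gradients (assumed)

def encode(x):
    """Real number -> fixed-point element of Z_MODULUS."""
    return round(x * SCALE) % MODULUS

def decode(v):
    """Element of Z_MODULUS -> signed real number."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE

def pairwise_masks(n_parties, dim, seed=0):
    """Each pair (i, j), i < j, shares a random mask r_ij; party i adds it
    and party j subtracts it, so all masks cancel in the aggregate.
    The seeded RNG is an insecure stand-in for pairwise key agreement."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(n_parties)]
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            r = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + r[k]) % MODULUS
                masks[j][k] = (masks[j][k] - r[k]) % MODULUS
    return masks

def masked_update(gradient, mask):
    """Client side: what a partner actually uploads."""
    return [(encode(g) + m) % MODULUS for g, m in zip(gradient, mask)]

def aggregate(masked_updates):
    """Server side: sums masked vectors without seeing any single gradient."""
    dim = len(masked_updates[0])
    total = [sum(u[k] for u in masked_updates) % MODULUS for k in range(dim)]
    return [decode(v) for v in total]

gradients = [[0.5, -1.25], [0.25, 0.75], [-1.0, 2.0]]  # three partners
masks = pairwise_masks(len(gradients), dim=2)
updates = [masked_update(g, m) for g, m in zip(gradients, masks)]
print(aggregate(updates))  # -> [-0.25, 1.5]
```

Working in a finite ring with fixed-point encoding, rather than adding random floats, makes the masks cancel exactly and keeps each masked value uniformly distributed.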
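This paper introduces a method for constructing confidence bands for bounded, band-limited functions from a finite sample of input-output pairs. The approach is distribution-free with respect to the observation noise, and only knowledge of the input distribution is assumed. It is nonparametric, that is, it does not require a parametric model of the regression function, and the regions come with non-asymptotic guarantees. The algorithm is based on the theory of Paley-Wiener reproducing kernel Hilbert spaces. The paper first studies the fully observable variant, in which the observations are noise-free and only the inputs are random. It then generalizes the ideas to the noisy case using gradient-perturbation methods. Finally, numerical experiments demonstrating both cases are presented.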
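For readers unfamiliar with Paley-Wiener spaces, the sketch below shows their reproducing kernel, k(x, y) = sin(η(x − y)) / (π(x − y)) with k(x, x) = η/π, for functions whose Fourier transform vanishes outside [−η, η], together with an ordinary kernel-ridge point estimate built on it. It does not implement the paper's distribution-free confidence bands; the bandwidth `eta` and ridge parameter `lam` are assumed values chosen for illustration.

```python
import math

def pw_kernel(x, y, eta=math.pi):
    """Reproducing kernel of the Paley-Wiener space with band limit eta:
    k(x, y) = sin(eta * (x - y)) / (pi * (x - y)), and k(x, x) = eta / pi."""
    d = x - y
    if abs(d) < 1e-12:
        return eta / math.pi
    return math.sin(eta * d) / (math.pi * d)

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def kernel_ridge_fit(xs, ys, lam=1e-3, eta=math.pi):
    """Point estimate f(x) = sum_i w_i k(x_i, x), where (K + lam*n*I) w = y.
    This is plain kernel ridge regression, not a confidence band."""
    n = len(xs)
    gram = [[pw_kernel(xi, xj, eta) + (lam * n if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    w = solve(gram, ys)
    return lambda x: sum(wi * pw_kernel(xi, x, eta) for wi, xi in zip(w, xs))

# With eta = pi, the kernels centred at the integers are orthonormal, so
# interpolating a single spike recovers the normalized sinc function.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 0.0, 0.0]
f_hat = kernel_ridge_fit(xs, ys, lam=0.0)
print(f_hat(0.5))  # close to 2 / pi
```

A confidence band would additionally need to bound how far any band-limited function consistent with the data could deviate from such an estimate; that construction is the paper's contribution and is not reproduced here.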